Files to consider

  • 13GB Crime (use sample to build)
  • LSOA_popv2 -> Geo + city
  • LSOA(2011)_toLOSA -> Geo + city location

Big Data Product: Weapons and Drugs

In the television documentary “Ross Kemp and the Armed Police” broadcast 6th September 2018 by ITV, multiple claims were made regarding violent crime in the UK.

These claims were that:

  1. Violent Crime is increasing
  2. There are more firearms incidents per head in Birmingham than anywhere else in the UK
  3. Crimes involving firearms are closely associated with drugs offences

To solve this problem, you will use publicly available data sets that have been prepared for you and placed online. These include (but are not limited to):

  1. Street Level Crime Data published by the UK Home Office. This dataset contains 19 million data rows, each giving a crime type together with its location as a latitude and longitude.
  2. English Indices of Deprivation Data: The English Indices of Deprivation 2010 data set contains the rankings of measures of deprivation at small area level across England. The 32000 localities are ranked from the least to most deprived, scored on seven different dimensions of deprivation.

Assignment Specifics

  1. Process the given data efficiently using Apache Spark on a cloud Infrastructure as a Service (IaaS) platform. A sample Jupyter Notebook has been provided on Blackboard.
  2. Filter the dataset so that only relevant crimes are included.
  3. Using appropriate techniques, determine whether Violent Crimes are increasing, decreasing, or are stable.
  4. Determine whether there are more firearms incidents per head in Birmingham than anywhere else in the UK. Possession of firearms carries a mandatory prison sentence in the UK. Therefore, you may assume that a crime type of “Possession of weapons” whose outcome is “offender sent to prison” was a firearm incident.
  5. Using appropriate techniques, determine whether firearms incidents are associated with drugs offences.
  6. Select and prepare no more than four visualizations to support your analytic findings from (3).
  7. Explain the reasoning behind your code so that it is clear what each block is intended to achieve (i.e., appropriately comment the command line).
  8. Assess the three claims given and determine whether they are true, false, or cannot be determined.
  9. Critically assess and report on the advantages, disadvantages, and limitations of the methods used.
  10. Your submission will be a Jupyter Notebook containing both code (typically Python), and explanatory text (in Markdown format) limited to 2500 words (plus references).

1. Introduction

The crime analysis task and the approach taken to the problem

Crime is a major problem for society: it undermines peace of mind, harms localities, and affects many aspects of people's lives.

Introduction

  • Drugs were initially developed for medical purposes. They have short- and long-term effects on the body, both physical and mental.
  • Criminals started selling drugs illegally to civilians, turning them into addicts and therefore recurring buyers.
  • Since drugs are illegal, buyers pay the dealer extra money as a fee for security and shipping.

  • Weapons are illegal to own in many countries.
  • People who legally own weapons are generally police officers, security personnel, or individuals holding proper permission under various restrictions.
  • When crimes involve both weapons and drugs, they harm innocent people caught up in the act, the public, and society.

Approach taken

  • Use a big data processing tool (PySpark) running on cloud infrastructure: IaaS (such as AWS EC2) or PaaS (such as AWS Glue or Azure Data Factory)
  • The analysis studies crimes in the United Kingdom (UK) between 2010 and 2020
  • Using statistical and visualization techniques, we study trends and assess the claims made by the media.
  • How are we doing the analysis?
    1. Use PySpark to process and aggregate the raw data (~19MM rows)
    2. Use several visual views to turn the data into a more digestible form
    3. Apply statistical methods to test the hypotheses and claims

Difference between IaaS and PaaS

| Index | Topic | IaaS | PaaS |
|---|---|---|---|
| 0 | Where does it run? | Runs in any cloud computer, virtual machine, or computer | Underlying architecture is already handled by |
| 1 | Who does the setup? | The user needs to set it up with the required packages | Setup is already done |
| 2 | Customizability | Very modular and highly customizable | Less customizable |
| 3 | Is it easy to set up? | Setting up requires time | Requires less time than IaaS |
| 4 | Can I interact and debug? | Run a server and work in a Jupyter notebook, which helps run things interactively and debug each step | Debugging options are heavily reduced. All generated visuals need to be saved before viewing |
| 5 | Examples | AWS EC2, Azure Virtual Machines | AWS Glue job, Azure Data Factory |

2. Component Selection and Data Pipeline Implementation

SparkSession - hive

SparkContext (Spark UI)

  • Version: v3.3.1
  • Master: local[*]
  • AppName: Spark

3. Data Extraction and Filtering System running, test and diagnostics

Given Specifics:

  1. Filter the dataset so that only relevant crimes are included.
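
The filtering step can be sketched as follows, in plain Python so that it runs without a Spark cluster. This is a minimal illustration and not the notebook's actual code: `CRIME_TYPE` matches the column shown in the tables of this report, while `LAST_OUTCOME` and the exact outcome text are assumed names for the outcome field. The firearm proxy follows the rule given in the assignment brief.

```python
# Crime types relevant to the three claims (names as they appear in the crime-type table).
VIOLENT_TYPES = {"Violent crime", "Violence and sexual offences"}
WEAPON_TYPES = {"Possession of weapons", "Public disorder and weapons"}
DRUG_TYPES = {"Drugs"}
RELEVANT_TYPES = VIOLENT_TYPES | WEAPON_TYPES | DRUG_TYPES


def filter_relevant(rows):
    """Keep only rows whose CRIME_TYPE matters for the claims."""
    return [r for r in rows if r.get("CRIME_TYPE") in RELEVANT_TYPES]


def is_firearm_incident(row):
    """Firearm proxy from the brief: weapon possession that ended in prison.

    LAST_OUTCOME is a hypothetical column name for the outcome field.
    """
    outcome = (row.get("LAST_OUTCOME") or "").strip().lower()
    return (row.get("CRIME_TYPE") == "Possession of weapons"
            and outcome == "offender sent to prison")


# Tiny illustrative sample, not real data.
sample = [
    {"CRIME_TYPE": "Burglary", "LAST_OUTCOME": None},
    {"CRIME_TYPE": "Drugs", "LAST_OUTCOME": "Offender fined"},
    {"CRIME_TYPE": "Possession of weapons", "LAST_OUTCOME": "Offender sent to prison"},
]
relevant = filter_relevant(sample)
firearms = [r for r in relevant if is_firearm_incident(r)]
print(len(relevant), len(firearms))
```

On the full dataset the same predicates would normally be pushed into Spark (for example with `DataFrame.filter` and `Column.isin`) so that the filtering happens before any data is collected to the driver.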

4. Design, Development and reasoning behind use of multiple visualization methods, statistics, and machine learning Models

  1. Using appropriate techniques, determine whether Violent Crimes are increasing, decreasing, or are stable.
  2. Determine whether there are more firearms incidents per head in Birmingham than anywhere else in the UK. Possession of firearms carries a mandatory prison sentence in the UK. Therefore, you may assume that a crime type of “Possession of weapons” whose outcome is “offender sent to prison” was a firearm incident.
  3. Using appropriate techniques, determine whether firearms incidents are associated with drugs offences.
  4. Select and prepare no more than four visualizations to support your analytic findings from (3).

General View

Crimes across Year-Month

Observation:

  • Overall crime in the UK follows a cyclic pattern, with a peak in mid-year
  • Total crime was lower during 2014 to 2016 than in the other years
Fig 1. All Crimes between 2010 to 2021 (x-axis: MONTH, y-axis: count)

Observation:

  • "Other crime" initially appears to be miscategorized
  • "Violence and sexual offences" increases over time
  • Anti-social behaviour keeps decreasing until 2019 and then rises suddenly in 2020, probably due to the coronavirus outbreak
  • Drugs show a constant trend over time
  • "Violent crime" has a declining trend. No data is available under this category after Apr-2013
  • Shoplifting, Burglary, Vehicle crime, and Criminal damage and arson follow similar patterns
  • If "Violent crime" and "Violence and sexual offences" are combined, there is an upward trend. As of 2021, the combined count is about 3x its 2013 level
Fig 2. Trend of All Crime Types (x-axis: MONTH, y-axis: count, one line per CRIME_TYPE)

UK Geography - Density Heatmap Observations

  • More crimes are recorded in the counties
  • Crime density is highest in London and Birmingham

Finding Crime that relate to Weapon & Drugs

Checking the various crime types

Grouping crime types together and counting the crimes in each

Observations:

  • For weapons, the following types can be used: "Possession of weapons" and "Public disorder and weapons"
  • Drug crimes outnumber weapon crimes
  • Anti-social behaviour is roughly 1.8x Violence and sexual offences (20,211,528 vs 11,411,540 cases)
Fig 5. Proportion View Of Crimes (share of each CRIME_TYPE)

Fig 6. Number of Crimes Cases by Type (x-axis: CRIME_TYPE, y-axis: count)
| Index | CRIME_TYPE | count | CRIME_% |
|---|---|---|---|
| 0 | Anti-social behaviour | 20211528 | 31.06 |
| 1 | Violence and sexual offences | 11411540 | 17.54 |
| 2 | Criminal damage and arson | 5343182 | 8.21 |
| 3 | Other theft | 5206259 | 8.00 |
| 4 | Burglary | 4350252 | 6.68 |
| 5 | Vehicle crime | 4170769 | 6.41 |
| 6 | Shoplifting | 3227557 | 4.96 |
| 7 | Other crime | 2565111 | 3.94 |
| 8 | Public order | 2564695 | 3.94 |
| 9 | Drugs | 1682486 | 2.59 |
| 10 | Violent crime | 1673219 | 2.57 |
| 11 | Bicycle theft | 735689 | 1.13 |
| 12 | Theft from the person | 714621 | 1.10 |
| 13 | Robbery | 696008 | 1.07 |
| 14 | Possession of weapons | 283189 | 0.44 |
| 15 | Public disorder and weapons | 242145 | 0.37 |

5. Selection, application, and reasoning behind use of statistical analysis and multiple evaluation measures

Media Claim #1 "Violent Crime is increasing"

To test the claim:

  1. Filter for CRIME_TYPE='Violent crime' or 'Violence and sexual offences'
  2. Look at trend over time
  3. Additional insight: Look at trend over time by LSOA Area
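
The trend check in these steps can be sketched as below. This is an illustrative outline in plain Python, not the notebook's implementation: the `MONTH` and `CRIME_TYPE` column names follow the figures and tables in this report, while the helper names and the 1% `tolerance` cut-off are assumptions chosen for the example. A straight-line slope ignores the seasonal cycle, so it should be read together with the trend chart.

```python
from collections import Counter


def monthly_counts(rows, crime_types):
    """Count rows per MONTH (e.g. '2013-04') for the given crime types."""
    counts = Counter(r["MONTH"] for r in rows if r["CRIME_TYPE"] in crime_types)
    return sorted(counts.items())  # 'YYYY-MM' strings sort chronologically


def trend_slope(values):
    """Ordinary least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def classify_trend(values, tolerance=0.01):
    """Label a series by its slope relative to its mean level.

    tolerance is an arbitrary illustrative cut-off: an average monthly change
    below 1% of the mean is treated as stable.
    """
    slope = trend_slope(values)
    mean = sum(values) / len(values)
    if abs(slope) < tolerance * mean:
        return "stable"
    return "increasing" if slope > 0 else "decreasing"


# Tiny illustrative sample, not real data.
VIOLENT = {"Violent crime", "Violence and sexual offences"}
rows = [
    {"MONTH": "2013-01", "CRIME_TYPE": "Violent crime"},
    {"MONTH": "2013-02", "CRIME_TYPE": "Violence and sexual offences"},
    {"MONTH": "2013-02", "CRIME_TYPE": "Violent crime"},
    {"MONTH": "2013-02", "CRIME_TYPE": "Burglary"},
]
series = monthly_counts(rows, VIOLENT)
print(series, classify_trend([count for _, count in series]))
```

Combining the two violent crime categories before counting matters here because the "Violent crime" label stops after Apr-2013.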
Claim #1. Violent Crimes are Increasing

Test 1: Check if Violent Crimes is Increasing

Fig 7. Claim #1. Violent Crimes is Increasing (x-axis: MONTH, y-axis: count)

Test 2: Check if Violent Crimes is Increasing by LSOA Area

Fig 8. Claim #1. Violent Crimes is Increasing By LSOA Area (x-axis: MONTH, y-axis: count, animated by AREA_NAME)

Media Claim #2 "There are more firearms incidents per head in Birmingham than anywhere else in the UK"

To prove the claim, preprocessing is required

  1. Make firearm dataset: Filter for CRIME_TYPE = 'Possession of weapons'
  2. Aggregate population by Area name
  3. Aggregate Crimes by Area name
  4. Join step 2 & step 3 by Area name
  5. Calculate the ratio $\text{FIRE\_ARM\_PER\_HEAD} = \dfrac{\text{number of firearm cases}}{\text{population of the LSOA area}}$ and its percentage form (PCT), which is the ratio multiplied by 100
  6. Create a rank index in descending order of the ratio (a lower rank number means a higher ratio, i.e. more firearms per head)
  7. Plot a trend chart comparing Birmingham with the top 20 ranked areas
  8. Check Birmingham rank over time
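
Steps 2 to 6 can be sketched as follows. This is a minimal plain-Python illustration rather than the notebook's PySpark code; the output field names mirror the columns used in this report, while the function name and the sample figures are made up for the example.

```python
def firearm_rate_ranking(cases_by_area, population_by_area):
    """Join cases and population by area name, compute the per-head rate, rank descending."""
    joined = []
    for area, cases in cases_by_area.items():
        population = population_by_area.get(area)
        if not population:
            # No population figure (e.g. an unknown area name): a rate cannot be computed.
            continue
        per_head = cases / population
        joined.append({
            "AREA_NAME": area,
            "FIRE_ARM_PER_HEAD": per_head,
            "FIRE_ARM_PER_HEAD_PCT": per_head * 100,
        })
    # Rank 1 = highest rate. Ties get consecutive ranks in this simple version.
    joined.sort(key=lambda row: row["FIRE_ARM_PER_HEAD"], reverse=True)
    for rank, row in enumerate(joined, start=1):
        row["FIRE_ARM_PER_HEAD_RANK"] = rank
    return joined


# Made-up numbers for illustration only.
cases = {"Area A": 50, "Area B": 200, "Unknown Area Name": 999}
population = {"Area A": 10_000, "Area B": 1_000_000}
ranking = firearm_rate_ranking(cases, population)
for row in ranking:
    print(row["FIRE_ARM_PER_HEAD_RANK"], row["AREA_NAME"], row["FIRE_ARM_PER_HEAD"])
```

In this toy example Area B has four times as many cases as Area A but ranks below it per head, which is the distinction between the raw-count ranking and the per-head ranking used for this claim.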

Observations:

  • Based on the number of crime reports per year, Birmingham ranks 2nd in most years
  • By number of firearm incidents per person, however, Birmingham is not the leader

Note: FIRE_ARM_PER_HEAD_RANK is equal to the rank of FIRE_ARM_PER_HEAD_PCT, because PCT is just the ratio multiplied by 100.

Claim #2. There are more firearms incidents per head in Birmingham than anywhere else in the UK

Test 1: What is the rank of Birmingham?

Fig 9. Position of Birmingham across years (x-axis: YEAR, y-axis: value, series: FIRE_ARM_PER_HEAD_RANK and CRIMES_RANK)
What are the top 20 Areas involving firearms incidents?

Top 20 AREA_NAME by FIRE_ARM_PER_HEAD_RANK (columns are ranks 1 to 20, rows are years)

| YEAR | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | City of London | Westminster | Camden | Islington | Exeter | Ipswich | Hammersmith and Fulham | Lincoln | Uttlesford | Kingston upon Hull | Manchester | Southampton | Brent | Worcester | Tower Hamlets | Ealing | Nottingham | Portsmouth | Lambeth | Lewisham |
| 2012 | City of London | Westminster | Camden | Islington | Hammersmith and Fulham | Ipswich | Exeter | Lincoln | Worcester | Uttlesford | Kingston upon Hull | Lambeth | Manchester | Southwark | Tower Hamlets | Brent | Liverpool | Middlesbrough | Oxford | Crawley |
| 2013 | City of London | Uttlesford | Westminster | Hammersmith and Fulham | Camden | Islington | Lincoln | Kingston upon Hull | Oxford | Middlesbrough | Exeter | Merthyr Tydfil | Nottingham | Lambeth | Southwark | Tower Hamlets | Liverpool | Ipswich | Plymouth | Hackney |
| 2014 | City of London | Uttlesford | Crawley | Westminster | Boston | Nottingham | Hillingdon | Manchester | Stevenage | Luton | Lambeth | Stoke-on-Trent | Hastings | Northampton | Newcastle upon Tyne | Liverpool | Corby | Islington | Lincoln | Brighton and Hove |
| 2015 | City of London | Uttlesford | Crawley | Nottingham | Boston | Westminster | Broxbourne | Watford | Manchester | Stevenage | Newcastle upon Tyne | Lambeth | Hastings | Corby | Lincoln | Luton | Southampton | Kensington and Chelsea | Ipswich | Mansfield |
| 2016 | City of London | Uttlesford | Stevenage | Watford | Broxbourne | Hertsmere | Dacorum | Welwyn Hatfield | Crawley | East Hertfordshire | North Hertfordshire | St Albans | Three Rivers | Westminster | Hastings | Nottingham | Windsor and Maidenhead | Manchester | Southampton | Brighton and Hove |
| 2017 | Uttlesford | City of London | Stevenage | Watford | Welwyn Hatfield | Broxbourne | Dacorum | Hertsmere | North Hertfordshire | St Albans | Three Rivers | East Hertfordshire | Westminster | Manchester | Crawley | Nottingham | Lambeth | Hastings | Shepway | Brighton and Hove |
| 2018 | Uttlesford | City of London | Stevenage | Manchester | Crawley | Camden | Watford | Shepway | Westminster | Hastings | Nottingham | Broxbourne | Welwyn Hatfield | Salford | Lincoln | Tameside | Norwich | Leicester | Dacorum | Newcastle upon Tyne |
| 2019 | City of London | Crawley | Shepway | Nottingham | Uttlesford | Hastings | Westminster | Lincoln | Brighton and Hove | Mansfield | Southampton | Corby | Great Yarmouth | Luton | Portsmouth | Derby | Norwich | Peterborough | Ipswich | Leicester |
| 2020 | City of London | Crawley | Mansfield | Southampton | Portsmouth | Norwich | Lincoln | Hastings | Blackpool | Wellingborough | Worthing | Middlesbrough | Hartlepool | Birmingham | Nottingham | Preston | Peterborough | Slough | Brighton and Hove | Liverpool |
| 2021 | Mendip | Enfield | Lewisham | Barrow-in-Furness | Winchester | North Kesteven | Christchurch | Boston | Uttlesford | Bexley | Southend-on-Sea | Dartford | Dover | Newham | Blackpool | Cotswold | Merton | Lancaster | Daventry | Tandridge |
What are the top 20 Areas by number of firearms incidents?

Top 20 AREA_NAME by CRIMES_RANK for FIRE_ARM cases (columns are ranks 1 to 20, rows are years)

| YEAR | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011 | Unknown Area Name | Birmingham | Manchester | Westminster | Liverpool | Leeds | Camden | Ealing | Brent | Leicester | Islington | Cardiff | Nottingham | Lambeth | Kingston upon Hull | Bradford | Southwark | Lewisham | Southampton | Tower Hamlets |
| 2012 | Unknown Area Name | Birmingham | Manchester | Westminster | Leeds | Liverpool | Camden | Bradford | Islington | Lambeth | Brent | Cardiff | Southwark | Ealing | Leicester | Kingston upon Hull | Tower Hamlets | Hammersmith and Fulham | Nottingham | Croydon |
| 2013 | Unknown Area Name | Birmingham | Liverpool | Manchester | Westminster | Leeds | Lambeth | Bradford | Nottingham | Southwark | Cardiff | Leicester | Camden | Tower Hamlets | Kingston upon Hull | Hillingdon | Hammersmith and Fulham | Cornwall | Sheffield | Lewisham |
| 2014 | Unknown Area Name | Birmingham | Manchester | Uttlesford | Sheffield | Liverpool | Nottingham | Leeds | Hillingdon | Westminster | Lambeth | Bradford | Leicester | Newcastle upon Tyne | Stoke-on-Trent | Brighton and Hove | Croydon | Sunderland | Southwark | Luton |
| 2015 | Unknown Area Name | Birmingham | Manchester | Nottingham | Leeds | Sheffield | Liverpool | Uttlesford | Lambeth | Bradford | Newcastle upon Tyne | Crawley | Westminster | Leicester | Hackney | Southampton | Derby | Newham | Southwark | Sunderland |
| 2016 | Unknown Area Name | Birmingham | Manchester | Leeds | Sheffield | Nottingham | Uttlesford | Westminster | Bradford | Liverpool | Dacorum | Lambeth | Stevenage | Watford | Brighton and Hove | Leicester | Newcastle upon Tyne | Broxbourne | Kirklees | East Hertfordshire |
| 2017 | Unknown Area Name | Manchester | Birmingham | Uttlesford | Leeds | Sheffield | Nottingham | Dacorum | Lambeth | Westminster | Welwyn Hatfield | Stevenage | Liverpool | Leicester | Bradford | Southwark | Croydon | Brighton and Hove | St Albans | Newham |
| 2018 | Unknown Area Name | Manchester | Birmingham | Leeds | Uttlesford | Camden | Sheffield | Nottingham | Leicester | Bradford | Liverpool | Westminster | Kirklees | Southwark | Lambeth | Newcastle upon Tyne | Brighton and Hove | Salford | Newham | Southampton |
| 2019 | Unknown Area Name | Birmingham | Leeds | Manchester | Sheffield | Nottingham | Liverpool | Bradford | Leicester | Westminster | Kirklees | Brighton and Hove | Doncaster | Newcastle upon Tyne | Lambeth | Southampton | Derby | Croydon | Newham | Southwark |
| 2020 | Unknown Area Name | Birmingham | Leeds | Sheffield | Liverpool | Bradford | Nottingham | Leicester | Southampton | Doncaster | Brighton and Hove | Coventry | Kirklees | Newcastle upon Tyne | Croydon | Portsmouth | Wolverhampton | Plymouth | Derby | Lambeth |
| 2021 | Unknown Area Name | Birmingham | Leeds | Sheffield | Nottingham | Bradford | Liverpool | Leicester | Doncaster | Southampton | Derby | Kirklees | Sandwell | Lambeth | Bristol | Southwark | Newcastle upon Tyne | Portsmouth | Northampton | Brighton and Hove |

Media Claim #3 "Crimes involving firearms are closely associated with drugs offences"

To test the claim:

  1. weapon_and_drugs = (CRIME_TYPE='Possession of weapons' | CRIME_TYPE='Public disorder and weapons' | CRIME_TYPE='Drugs')
  2. Combine 'Possession of weapons' and 'Public disorder and weapons' into a single 'WEAPON_CRIME' type
  3. Visualize the trend of weapon and drug crimes
  4. Statistical test: correlation between the counts of weapon crimes and drug crimes
    • A value close to 1 or -1 means the two counts move together strongly (in the same or in opposite directions)
    • A value of 0 means there is no linear relationship between weapon crimes and drug crimes
    • For the test, we use a correlation above 0.7 or below -0.7 as the threshold for concluding that weapons and drugs are correlated
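
The correlation test in step 4 can be sketched with a hand-written Pearson coefficient. The function names are illustrative, and the 0.7 threshold is the one chosen above; the notebook's tables look like the output of a library correlation call, so this is only a dependency-free equivalent for a pair of count series.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def is_strongly_correlated(r, threshold=0.7):
    """Apply the decision rule: |r| must exceed the chosen threshold."""
    return abs(r) > threshold


# Made-up monthly counts for illustration only.
drug_counts = [120, 135, 150, 110, 160]
weapon_counts = [30, 33, 41, 28, 39]
r = pearson(drug_counts, weapon_counts)
print(round(r, 3), is_strongly_correlated(r))
```

A high coefficient only shows that the two counts rise and fall together over time; it does not show that individual firearm incidents involved drugs.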

Observations

  • The weapon crime trend is flat and does not shift much over time
  • The drugs trend declines from 2014 until 2018
  • The correlation score (0.54) shows a moderate positive correlation, but it does not cross the 0.7 threshold, so it is not enough to support the statement "Drugs are associated with Weapons"
Trend of Weapons & Drug related crimes

Fig 10. Trend of Weapon vs Drugs (x-axis: MONTH, y-axis: count, series: Drugs and WEAPON_CRIME)
Overall Correlation Check between Weapons & Drug crimes

Checking if there is a correlation between Weapons & Drugs involvement

| | DRUG_CRIME_COUNT | WEAPON_CRIME_COUNT |
|---|---|---|
| DRUG_CRIME_COUNT | 1.000000 | 0.541564 |
| WEAPON_CRIME_COUNT | 0.541564 | 1.000000 |
Trend by Area Name | Weapons & Drug related crimes

Fig 11. Trend of Weapon vs Drugs by LSOA AREA (x-axis: MONTH, y-axis: count, series: Drugs and WEAPON_CRIME, animated by AREA_NAME)

Fig 12. Across Months Distibution of Crimes: Weapon, Drug and both (x-axis: variable, y-axis: value, groups: count WEAPON_CRIME, count Drugs, Combined)
Correlation between count Drugs and count WEAPON_CRIME, by Area Name:

| Area Name | Correlation |
|---|---|
| Birmingham | 0.745904 |
| East Riding of Yorkshire | 0.723121 |
| Manchester | 0.733751 |
| Salford | 0.708419 |
| Stockport | 0.701101 |

Observations

  • Five areas have a correlation above 0.7, indicating that weapon crimes and drug crimes move together in these areas:
    • Areas: Birmingham, East Riding of Yorkshire, Manchester, Salford, Stockport
Fig 13. Trend of Weapon vs Drugs by Area Name having higher correlation (x-axis: MONTH, y-axis: value, series: count WEAPON_CRIME and count Drugs, animated by AREA_NAME)

Burglary Protection

Client: Insurance company developing a highly segmented home insurance product

Claims to test:

  1. There are more burglaries in more affluent areas
  2. Burglaries are increasing, decreasing, or are stable

To test the claims:

1. There are more burglaries in more affluent areas

  1. Approach: Rank the area names by total number of Burglary crimes over the years

  Observations:
     1. Leeds, Birmingham, Sheffield, Bradford, Liverpool, Bristol, and Manchester are the top areas, where Burglary crime counts are high compared to other areas
     2. We can create buckets / bins based on the ranks / number of crimes

2. Burglaries are increasing, decreasing, or are stable

  1. Approach: Look at the trend of burglaries over time
  2. Aggregate by area name and look at the trend of each area

  Observations:
    1. The cyclic trend of Burglary crimes is consistent and peaks at the end of the year.
    2. Between Jan 2014 and Mar 2019, the trend stays within ±3K crimes of the mean of 35K crimes
    3. Burglary reduced by 70% between Jan 2020 and March 2020 due to Covid-19, based on the "Timeline of UK government coronavirus lockdowns and measures, March 2020 to December 2021"

  Conclusion:
    1. Forecasting of Burglary crimes is significantly skewed by the Covid-19 effect.
    2. If we exclude the Covid-19 period (2019 FY - 2021 FY), Burglary crimes declined and then became stable after Jan 2014.
    3. Across the timeline, crime peaks at the end of the year.

Takeaway for the Insurance Company

Using a bucketing technique, we can create clusters of areas that can be used to set the policy price.

Observation:

1. Using a binning technique, areas can be bucketed into 10 groups based on the number of Burglary crimes across years. Looking at the number of areas in each bucket, everything after the 4th bucket can be combined into 1 group
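
The binning step can be sketched as equal-width binning over the per-area totals. This is an assumption about the method: the minimum and maximum values in the Crime_Bins table of this section are consistent with ten equal-width bins between 51 and 123986, but the notebook's exact call is not shown. The area names below are placeholders; the totals are boundary values taken from that table.

```python
def equal_width_bins(totals_by_area, n_bins=10):
    """Assign each area to one of n_bins equal-width bins over the range of totals."""
    lo = min(totals_by_area.values())
    hi = max(totals_by_area.values())
    width = (hi - lo) / n_bins
    bins = {}
    for area, total in totals_by_area.items():
        if width == 0:
            index = 1  # all areas have the same total
        else:
            # The maximum would fall just past the last bin, so clamp it into bin n_bins.
            index = min(int((total - lo) / width), n_bins - 1) + 1
        bins[area] = f"Crime_Bin_{index}"
    return bins


# Placeholder area names; totals are min/max values from the Crime_Bins table.
totals = {
    "area_a": 51,
    "area_b": 12416,
    "area_c": 12606,
    "area_d": 98979,
    "area_e": 102702,
    "area_f": 123986,
}
bins = equal_width_bins(totals)
for area, label in bins.items():
    print(area, totals[area], label)
```

Equal-width bins are dominated by the few areas with very large totals, which is why most areas end up in the first bins and one bin can be empty. Library helpers may treat bin edges slightly differently (for example right-closed rather than left-closed intervals).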

Better approach instead of Binning

A better technique which can be used is K-Means Segmentation.

  1. Create various features for the model
  2. Score the areas
  3. Between model scoring runs, track how many data points shift from one segment to another
  4. When segment membership shifts drastically (say 10%-20% of points move across clusters), re-run the K-Means model
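
The drift check in steps 3 and 4 can be sketched as below. The function names and the 10% default are illustrative, taken from the lower end of the range suggested above. The sketch assumes both runs are scored with the same fitted model, so that segment labels are comparable; labels from two separately fitted K-Means models are arbitrary and would need to be matched first.

```python
def segment_shift_rate(previous, current):
    """Share of areas whose segment label changed between two scoring runs."""
    common = previous.keys() & current.keys()
    if not common:
        return 0.0
    moved = sum(1 for area in common if previous[area] != current[area])
    return moved / len(common)


def needs_refit(previous, current, threshold=0.10):
    """Flag a re-fit when the share of moved areas reaches the threshold."""
    return segment_shift_rate(previous, current) >= threshold


# Made-up segment assignments for illustration only.
last_run = {"Area A": 0, "Area B": 1, "Area C": 1, "Area D": 2}
this_run = {"Area A": 0, "Area B": 2, "Area C": 1, "Area D": 2}
print(segment_shift_rate(last_run, this_run), needs_refit(last_run, this_run))
```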
What are the top 20 Areas involving Burglary incidents?

Top 20 AREA_NAME by BURGLARY_CRIMES_RANK (columns are ranks 1 to 20, rows are years)

| YEAR | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010 | Leeds | Birmingham | Bradford | Manchester | Liverpool | Bristol | Kirklees | Redbridge | Barnet | Croydon | Sheffield | Enfield | Lewisham | Waltham Forest | Haringey | Brent | Sandwell | Newham | Coventry | Ealing |
| 2011 | Leeds | Birmingham | Bradford | Manchester | Sheffield | Kirklees | Liverpool | Bristol | Barnet | Croydon | Doncaster | Kingston upon Hull | Unknown Area Name | Nottingham | Wakefield | Ealing | Brent | Lambeth | Redbridge | Hillingdon |
| 2012 | Unknown Area Name | Leeds | Birmingham | Bradford | Manchester | Sheffield | Liverpool | Barnet | Bristol | Doncaster | Kirklees | Westminster | Croydon | Ealing | Coventry | Brent | Cardiff | Enfield | Kingston upon Hull | Lambeth |
| 2013 | Unknown Area Name | Leeds | Birmingham | Manchester | Bradford | Sheffield | Liverpool | Barnet | Kingston upon Hull | Kirklees | Croydon | Westminster | Bristol | Doncaster | Lambeth | Ealing | Southwark | Enfield | Coventry | Lewisham |
| 2014 | Unknown Area Name | Leeds | Birmingham | Manchester | Bradford | Sheffield | Liverpool | Bristol | Kingston upon Hull | Doncaster | Barnet | Kirklees | Nottingham | Leicester | Westminster | Croydon | Cardiff | Enfield | Lambeth | Coventry |
| 2015 | Unknown Area Name | Leeds | Birmingham | Manchester | Bradford | Liverpool | Sheffield | Bristol | Kirklees | Barnet | Kingston upon Hull | Westminster | County Durham | Leicester | Southwark | Doncaster | Coventry | Nottingham | Cardiff | Rochdale |
| 2016 | Unknown Area Name | Birmingham | Leeds | Manchester | Bradford | Liverpool | Bristol | Kirklees | Sheffield | Kingston upon Hull | Barnet | Westminster | Leicester | County Durham | Lambeth | Stockport | Wigan | Southwark | Southampton | Bolton |
| 2017 | Unknown Area Name | Birmingham | Leeds | Manchester | Bradford | Liverpool | Sheffield | Bristol | County Durham | Doncaster | Kirklees | Leicester | Bolton | Stockport | Wigan | Barnet | Westminster | Coventry | Kingston upon Hull | Southwark |
| 2018 | Unknown Area Name | Birmingham | Leeds | Manchester | Bradford | Sheffield | Liverpool | Westminster | Kirklees | Leicester | Bristol | Barnet | Sandwell | Doncaster | Camden | Coventry | County Durham | Brent | Tower Hamlets | Hackney |
| 2019 | Unknown Area Name | Birmingham | Leeds | Bradford | Sheffield | Westminster | Doncaster | Liverpool | Kirklees | Barnet | County Durham | Manchester | Camden | Hackney | Southwark | Enfield | Lambeth | Tower Hamlets | Kingston upon Hull | Sandwell |
| 2020 | Unknown Area Name | Birmingham | Leeds | Bradford | Sheffield | Liverpool | Doncaster | Westminster | Barnet | Tower Hamlets | Bristol | County Durham | Lambeth | Hackney | Southwark | Kingston upon Hull | Enfield | Lewisham | Croydon | Sandwell |
| 2021 | Unknown Area Name | Birmingham | Leeds | Sheffield | Liverpool | Bradford | Doncaster | Hackney | Wandsworth | Barnet | Bristol | Tower Hamlets | Lewisham | Lambeth | Southwark | Enfield | Westminster | Ealing | Haringey | Croydon |
What are the top 20 Areas involving Burglary incidents across years?

Top 20 Burglary cases based on OVERALL_BURGLARY_CRIMES_RANK

| OVERALL_BURGLARY_CRIMES_RANK | AREA_NAME |
|---|---|
| 1 | Unknown Area Name |
| 2 | Birmingham |
| 3 | Leeds |
| 4 | Bradford |
| 5 | Manchester |
| 6 | Sheffield |
| 7 | Liverpool |
| 8 | Bristol |
| 9 | Kirklees |
| 10 | Barnet |
| 11 | Doncaster |
| 12 | Westminster |
| 13 | Kingston upon Hull |
| 14 | Croydon |
| 15 | Leicester |
| 16 | Lambeth |
| 17 | Coventry |
| 18 | Southwark |
| 19 | County Durham |
| 20 | Enfield |
| Crime_Bins | number_of_areas | min_number_of_crime | max_number_of_crime |
|---|---|---|---|
| Crime_Bin_1 | 242 | 51.0 | 12416.0 |
| Crime_Bin_2 | 67 | 12606.0 | 24812.0 |
| Crime_Bin_3 | 28 | 25372.0 | 35979.0 |
| Crime_Bin_4 | 5 | 37566.0 | 42214.0 |
| Crime_Bin_5 | 2 | 50847.0 | 57860.0 |
| Crime_Bin_6 | 2 | 65054.0 | 68604.0 |
| Crime_Bin_7 | 0 | NaN | NaN |
| Crime_Bin_8 | 1 | 98979.0 | 98979.0 |
| Crime_Bin_9 | 1 | 102702.0 | 102702.0 |
| Crime_Bin_10 | 1 | 123986.0 | 123986.0 |
Income estimates for small areas, England and Wales: financial year ending 2018
Claim #2. Burglaries are increasing, decreasing, or are stable

Test 1: Check if Burglary Crimes is Increasing

Burglary crime count mean value: 34525.80952380953

Fig 14. Claim #1. Burglary Crimes is Increasing (x-axis: MONTH, y-axis: count)

Test 2: Check if "Burglaries are increasing, decreasing, or are stable" by LSOA Area

Fig 15. Claim #2. "Burglaries are increasing, decreasing, or are stable" By LSOA Area (x-axis: MONTH, y-axis: count, animated by AREA_NAME)

Using Machine Learning To Forecast Crimes

Using a regression model to forecast crimes (violent crimes, weapon crimes, and drug crimes)

Fig 16. Trend of All Crime Types (x-axis: MONTH, y-axis: count, one line per CRIME_TYPE)

6. Detailed Analysis and consideration of the appropriateness of the solution for the initial problem

Observations:

  • Filter: We narrow the data down to 'Possession of weapons' and 'Drugs' for forecasting
  • Forecast horizon: 12-month window
  • Machine Learning Model: Linear Regression

Model Performance:

  • The regression model follows the general level of the series, but the reported R-squared of -0.30 (RMSE 1837.64) means it fits worse than simply predicting the mean, so its forecasts should be treated with caution
  • We can also try other regression models such as xgboost, catboost, lightgbm, and so on.
  • Another approach is to use time series models such as fbprophet, ARIMA, or SARIMA
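
For reference, the three metrics reported for this model (MSE, RMSE, R-squared) can be computed as in the sketch below. The function name is illustrative. The point of the example is the sign of R-squared: it is 0 when the model does exactly as well as predicting the mean of the actual values, and negative when it does worse.

```python
from math import sqrt


def regression_metrics(actual, predicted):
    """Return MSE, RMSE and R-squared for paired actual and predicted values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    mse = ss_res / n
    return {"MSE": mse, "RMSE": sqrt(mse), "R2": 1 - ss_res / ss_tot}


# Made-up values for illustration only.
actual = [1.0, 2.0, 3.0]
mean_model = regression_metrics(actual, [2.0, 2.0, 2.0])   # always predicts the mean
worse_model = regression_metrics(actual, [3.0, 2.0, 1.0])  # predicts the opposite trend
print(mean_model["R2"], worse_model["R2"])
```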

Feature engineering:

  • When using a regression model, we need to create lags of the target variable.
    • Lags let the model see the seasonality in the data.
    • This is one of the ways time series models capture the relationship behind the seasonal pattern

How to use Regression for time series:

  1. Select data:
    • The data usually contains 2 columns: timestamp (month) and target (crimes)
  2. Create features:
    • Using lags, we shift the target by up to the desired number of steps so that the features capture seasonality
  3. Drop nulls
  4. Use lags as input for the model
  5. Train the model and predict
  6. Forecasting:
    • Start from the last known target
    • Once a forecast is made, insert it as a new row and shift the lags by 1 so that it becomes input for the next month
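
The steps above can be sketched in plain Python as follows. This is a simplified illustration, not the notebook's model: to stay dependency-free it fits a one-feature least-squares line on the oldest lag only (with 12 lags, the value one year earlier), whereas a real model would normally use all lags as features. All function names are illustrative.

```python
def make_lag_rows(series, n_lags):
    """Turn a series into (lags, target) pairs.

    The first n_lags points have no complete lag window and are dropped,
    which corresponds to the 'drop nulls' step.
    """
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


def fit_seasonal_lag(rows):
    """Fit target = a + b * oldest_lag by ordinary least squares."""
    xs = [lags[0] for lags, _ in rows]
    ys = [target for _, target in rows]
    n = len(rows)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def recursive_forecast(series, n_lags, coefficients, steps):
    """Forecast step by step, feeding each prediction back in as the newest lag."""
    a, b = coefficients
    window = list(series[-n_lags:])  # start from the last known targets
    forecast = []
    for _ in range(steps):
        prediction = a + b * window[0]
        forecast.append(prediction)
        window = window[1:] + [prediction]  # shift lags by 1 for the next month
    return forecast


# Made-up series with a perfect 4-step cycle, for illustration only.
history = [10, 20, 30, 40] * 6
rows = make_lag_rows(history, n_lags=4)
a, b = fit_seasonal_lag(rows)
future = recursive_forecast(history, n_lags=4, coefficients=(a, b), steps=8)
print(a, b, future)
```

Because each forecast becomes an input for the next step, errors accumulate over the horizon, which is one reason to compare this approach against dedicated time series models.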
Fig 17. Trend of `Possession of weapons` and `Drugs` (x-axis: MONTH, y-axis: NUMBER_OF_CRIMES)
MSE: 3376915.47
RMSE: 1837.64
R-squared: -0.30
Fig 18. Linear Regression Model Prediction vs ACTUAL (x-axis: index, y-axis: PREDICTED, series: PREDICTED_TRAIN, PREDICTED_TEST, PREDICTED_FUTURE, ACTUAL)

7. Evaluation and Conclusion

Conclusion of Claims by Media:

  1. Violent Crime is increasing - Media:

    • Conclusion:
      • Violent crimes are increasing according to the data. The statement is _True_ - Analysis
      • Based on trend analysis over months and years, we can conclude that violent crimes are increasing
      • As of 2021, violent crimes have increased to about 3x their 2013 level
  2. There are more firearms incidents per head in Birmingham than anywhere else in the UK - Media:

    • Conclusion:

      • We can draw 2 conclusions here:

        1. Based on the number of firearm incidents reported, Birmingham has the 2nd highest count; the 1st position is held by crimes reported at an unknown location

        2. Firearm incident reports per person in Birmingham are low (ranking around 100). Hence the claim "More firearms incidents per head in Birmingham" is False

  3. Crimes involving firearms are closely associated with drugs offences - Media:

    • Conclusion:
      • We _cannot conclude_ that weapon crimes are related to drug crimes based on the overall correlation score of 0.54
      • Reason: the correlation score (0.54) shows some correlation, but it does not cross the 0.7 threshold, so it is not enough to support the statement "Drugs are associated with Weapons"
      • Five areas have a correlation above 0.7, indicating that weapon crimes and drug crimes move together in these areas:
        • Areas: Birmingham, East Riding of Yorkshire, Manchester, Salford, Stockport

8. Scientific References and Citation

  • "Crime Detection and Prediction Using Big Data Analytics: A Case Study of Chicago" by Dong Wang, et al., published in IEEE Transactions on Big Data in 2018, https://www.jetir.org/papers/JETIR2107201.pdf

  • "Using Machine Learning to Assist Crime Prevention," 2017 6th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI), Hamamatsu, Japan, 2017, pp. 1029-1030, doi: 10.1109/IIAI-AAI.2017.46.

  • Office for National Statistics (ONS), published 5 March 2020, ONS website, statistical bulletin, Income estimates for small areas, England and Wales: financial year ending 2018